continuous control
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > United States > California > Santa Clara County > Cupertino (0.04)
- Asia > Middle East > Jordan (0.04)
- Workflow (0.46)
- Research Report > New Finding (0.46)
- North America > United States > Illinois (0.05)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New Jersey (0.04)
- (4 more...)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Middle East > Jordan (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.27)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- (2 more...)
Reduced Policy Optimization for Continuous Control with Hard Constraints
Recent advances in constrained reinforcement learning (RL) have endowed reinforcement learning with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints remains challenging, particularly in those situations with non-convex hard constraints. Inspired by the generalized reduced gradient (GRG) algorithm, a classical constrained optimization technique, we propose a reduced policy optimization (RPO) algorithm that combines RL with GRG to address general hard constraints.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.59)
Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.